The Robustness of Domain Lexico-Taxonomy: Expanding Domain Lexicon with CiLin

Authors

  • Chu-Ren Huang
  • Xiang-Bing Li
  • Jia-Fei Hong
Abstract

This paper deals with the robust expansion of Domain Lexico-Taxonomy (DLT). DLT is a domain taxonomy enriched with domain lexica, and it was proposed as an infrastructure for crossing domain barriers (Huang et al. 2004). The DLT proposal is based on the observation that domain lexica contain entries that are also part of a general lexicon. Hence, when entries of a general lexicon are marked with their associated domain attributes, this information has two important applications. First, the DLT serves as a source of seeds for domain lexica. Second, the DLT offers the most reliable evidence for deciding the domain of a new text, since these lexical clues belong to the general lexicon and occur reliably in all texts. General lexicon lemmas are therefore extracted to populate domain lexica, which are situated in a domain taxonomy. Building on this previous work, we show in this paper that the original DLT can be further expanded when a new language resource is introduced. We applied CiLin, a Chinese thesaurus, to add more than 1000 new entries to the DLT, and we show through evaluation that the DLT approach is robust, since both the size and the number of domain lexica increased effectively.
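The abstract does not say how CiLin entries were mapped onto the DLT, so the following is only an illustrative sketch of one plausible way a thesaurus could expand a seed domain lexicon, not the authors' procedure. It assumes that the thesaurus is available as groups of near-synonyms and that a domain label attached to one member of a group can be proposed for the other members. The function name `expand_domain_lexicon`, the data layout, and the sample lemmas and domain labels are all hypothetical.

```python
def expand_domain_lexicon(domain_lexicon, synonym_groups):
    """Propose domain labels for thesaurus group-mates of seed lemmas.

    domain_lexicon: dict mapping a lemma to a set of domain labels (the seeds).
    synonym_groups: iterable of lists of lemmas, one list per thesaurus group.
    Both structures are assumptions made for this sketch.
    """
    # Start from a copy of the seeds so the input is left untouched.
    expanded = {lemma: set(domains) for lemma, domains in domain_lexicon.items()}
    for group in synonym_groups:
        # Collect the domains already attached to seed lemmas in this group.
        group_domains = set()
        for lemma in group:
            group_domains |= domain_lexicon.get(lemma, set())
        if not group_domains:
            continue  # no seed in this group, so nothing to propagate
        for lemma in group:
            expanded.setdefault(lemma, set()).update(group_domains)
    return expanded


# Hypothetical seed lexicon and thesaurus groups.
seed = {"医生": {"medicine"}, "银行": {"finance"}}
groups = [["医生", "大夫", "医师"], ["银行", "钱庄"], ["桌子", "台子"]]

result = expand_domain_lexicon(seed, groups)
new_entries = sorted(set(result) - set(seed))
print(len(new_entries), "new entries proposed")
```

Labels proposed this way would still need the kind of evaluation the abstract mentions, because group-mates in a thesaurus do not always share a domain.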

Similar resources

Domain Lexico-Taxonomy: An Approach Towards Multi-domain Language Processing

This paper deals with the domain barrier issues in language processing. Our work centers on Domain Lexico-Taxonomy (DLT), a domain taxonomy enhanced by domain lexicons. We propose DLT as an infrastructure for crossing domain barriers. By using DLT with WordNet and Domain Taxonomy, we can get 15160 Chinese lemmas in 463 domains. We estimate the accuracy of the lemmas in five domains, and get 89.74% in...

Objects Identification in Object-Oriented Software Development - A Taxonomy and Survey on Techniques

Object-oriented analysis and design is one of the modern paradigms for developing a system. In this paradigm, there are several objects and each object plays some specific roles. Identifying objects (and classes) is one of the most important steps in the object-oriented paradigm. This paper reviews the literature on techniques to identify objects and then presents six taxonomies for them. The f...

Semantic Atomicity and Multilinguality in the Medical Domain: Design Considerations for the MorphoSaurus Subword Lexicon

We present the lexico-semantic foundations underlying a multilingual lexicon the entries of which are constituted by so-called subwords. These subwords reflect semantic atomicity constraints in the medical domain which diverge from canonical lexicological understanding in NLP. We focus here on criteria to identify and delimit reasonable subword units, to group them into functionally adequate sy...

Content Evaluation of Iranian EFL Textbook Vision 1 Based on Bloom’s Revised Taxonomy of Cognitive Domain

Textbooks are a common feature of classrooms and an important means of contributing to curricula. Their contents are therefore essential to developing adequate curriculum planning. Textbook analysis is a means by which different features of textbooks can be analyzed and their effectiveness validated. This study set out to evaluate the content of...

Generating a Resource for Products and Brandnames Recognition. Application to the Cosmetic Domain

The Named Entity Recognition task needs high-quality, large-scale resources. In this paper, we present RENCO, a rule-based system focused on the recognition of entities in the Cosmetic domain (brandnames, product names, ...). RENCO has two main objectives: 1) Generating resources for named entity recognition; 2) Mining new named entities relying on the previously generated resources. In order to ...

Publication date: 2005